Abstract: In a distributed storage system, the storage costs
of different storage nodes, in general, can be different. How to
store a file in a given set of storage nodes so as to minimize
the total storage cost is investigated. By analyzing the min-cut
constraints of the information flow graph, the feasible region of 
the storage capacities of the nodes can be determined. The storage 
cost minimization can then be reduced to a linear programming 
problem, which can be readily solved. Moreover, the tradeoff 
between storage cost and repair-bandwidth is established. 

I. Introduction 

A distributed storage system provides an elegant way to store
data reliably. The storage nodes are distributed across a
wide geographical area. When a small subset of storage nodes 
encounters a disaster, the source data object can still be recon- 
structed from the surviving nodes. To keep the reliability of the 
distributed storage system above a certain level, redundancy 
is essential. Two strategies are widely employed to introduce 
redundancy. The most straightforward strategy is replication, 
in which each storage node stores an entire copy of the source 
data object. This method, though simple, has low storage 
efficiency. The other strategy is erasure coding, adopted in the
OceanStore [1] and Total Recall [2] systems. A source data
object is divided into k equal-size fragments, and these
k fragments are then encoded and distributed over n storage
nodes; each node stores one encoded fragment. As a result, the
source data object can be reconstructed from any k available 
storage nodes. Compared with the replication strategy, erasure 
coding provides better storage efficiency. However, when a
failed storage node has to be repaired, erasure coding wastes
bandwidth. This is because a newcomer has to first reconstruct
the entire source data object by downloading data from any k
surviving nodes, and then re-encode and store only a fraction
of the downloaded data.

In order to minimize the repair-bandwidth, Dimakis et al.
[3], [4] propose the concept of regenerating codes. In their
formulation, the amount of data allocated to each storage node
is equal to $\alpha$ units. When a node failure occurs, a newcomer
arbitrarily chooses $d$ ($d \ge k$) available nodes to connect to and
downloads $\beta$ units of data from each of these $d$ nodes. By
introducing the information flow graph, they translate the repair
problem into a single-source multicast problem in network coding
theory. A tradeoff between the storage capacity per node and
repair-bandwidth is also established. In [5], a distributed storage
system in which different download costs are associated with
storage nodes is introduced. Specifically, the authors focus
on the scenario in which the storage nodes fall into two sets
according to their download costs. A tradeoff between
download cost and repair-bandwidth is identified.

In most current studies of distributed storage systems, the 
amount of data stored on each node is simply assumed to 
be identical. How to distribute the data across a collection of 
storage nodes is not an easy problem. Given the total storage 
budget, for different access models, Leong et al. [6] try to
find the corresponding optimal storage allocation, in the sense
of maximizing the probability of successful data recovery. It
is shown that symmetric allocation is not always an optimal
solution. However, their model deals only with the recovery
problem of the source data object; the repair problem of failed
nodes is not considered.

In a realistic scenario, the storage nodes should be allowed 
to store different amounts of data according to the conditions 
of transmission links between source node and storage nodes 
as well as storage cost associated with each storage node. 
It is natural that different storage nodes may have different 
storage costs in a real distributed storage system. Since the
storage nodes are distributed across a wide geographical area,
the storage costs are affected by many factors, such as rents 
of the data storage centers, storage hardware costs and labor 
costs for maintenance. 

In this paper, we combine the storage allocation problem
with the repair problem, and take different storage costs into
consideration. Our objective is to seek an optimal storage
allocation, which minimizes the total storage cost, subject to
the constraints obtained by analyzing the corresponding
information flow graphs. More specifically, we focus on the case
in which there are two types of storage nodes, each having
a different storage cost. We will show that our storage cost
minimization problem can be solved as a Linear Programming
(LP) problem. By identifying the feasible region of this LP
problem, the minimum storage cost is obtained at one of its
corner points. Moreover, the tradeoff between the storage cost
and repair-bandwidth can also be established.

This paper is organized as follows. The problem of storage
cost minimization is formulated in Section II. In Section III,
we draw the information flow graph and identify the min-cut
constraints. In Section IV, we characterize the minimum
storage cost by a linear programming problem. In Section V,
we illustrate the tradeoff between storage cost and
repair-bandwidth. We conclude in Section VI.

II. Problem Formulation 

Consider a distributed storage system consisting of two
types of storage nodes, each having a different storage cost
per unit data. Let the storage cost for the first type of nodes
be $C_1$, and the storage cost for the second type be $C_2$. We
assume that there are $n$ storage nodes in total, among which
$n_1$ nodes belong to type 1 and $n_2$ nodes belong to type 2. A
data object of size $M$ units is encoded and distributed among
the $n$ storage nodes. For simplicity of presentation, we assume
that the storage capacities of the nodes of type 1 are identical
and equal to $\alpha_1$, while the storage capacities of type 2 nodes
are identical and equal to $\alpha_2$. The total storage cost for storing
the original data object is $C_S = C_1 n_1 \alpha_1 + C_2 n_2 \alpha_2$.
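As a small illustration of the cost expression above, the following Python sketch evaluates $C_1 n_1 \alpha_1 + C_2 n_2 \alpha_2$ for one allocation. The function name and the numbers in the example are hypothetical and are not taken from the paper.

```python
def total_storage_cost(c1, c2, n1, n2, alpha1, alpha2):
    """Total cost C_1*n_1*alpha_1 + C_2*n_2*alpha_2 of one allocation."""
    if min(c1, c2, n1, n2, alpha1, alpha2) < 0:
        raise ValueError("costs, node counts and capacities must be non-negative")
    return c1 * n1 * alpha1 + c2 * n2 * alpha2


# Hypothetical numbers: 8 type-1 nodes storing 3 units each at unit cost 1,
# and 2 type-2 nodes storing 5 units each at unit cost 2.
example_cost = total_storage_cost(c1=1, c2=2, n1=8, n2=2, alpha1=3, alpha2=5)
```

The cost is linear in $\alpha_1$ and $\alpha_2$, which is what later allows the minimization to be treated as a linear program.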

There are two components in the design of distributed
storage systems: (i) A data collector (DC) connecting to any
$k$ available storage nodes should be able to reconstruct the
original data object by downloading a number of packets
from these $k$ storage nodes. (ii) Once a storage node fails,
a newcomer initializes a repair process and regenerates the
failed node so that any DC, connecting to this newcomer and
$k - 1$ other existing nodes, is able to rebuild the original data
object. During the repair process, the newcomer chooses $d$
($d \ge k$) surviving storage nodes to connect to, each belonging
to either type 1 or type 2, and then downloads $\beta$ units of data
from each of these $d$ nodes. The traffic $d\beta$ incurred by the
repair operation is defined as the repair-bandwidth.

There are two modes for storage-node repair. The first
one is called functional repair and the second one is exact
repair. In functional repair, the content of the newcomer is
not necessarily the same as the content of the failed node to
be replaced. We only need to ensure that any DC connecting
to any $k$ storage nodes is able to rebuild the original data file.
In exact repair, the content of the newcomer is required to be
exactly the same as the content of the failed node. We refer
the readers to [7], [8] for code constructions for exact repair.
In this paper, we focus on functional repair.

We model the distributed storage system as an information
flow graph, introduced in [3], [4]. For any information flow
graph, to be detailed in the next section, if the minimum of
the cut capacities between the source and each data collector
is not less than the data object size $M$, then there always
exists a linear network code such that all data collectors can
reconstruct the data object [9].

The objective of this work is to seek an optimal storage
allocation across the $n$ storage nodes that minimizes the total
storage cost $C_S$ under the constraints described above.

III. Min-Cut Constraints

The distributed storage network with storage cost is abstracted
and modeled by an information flow graph $G = (\mathcal{V}, \mathcal{E})$.
We label the storage nodes from 1 to $n$, so that the
storage nodes 1 to $n_1$ are of type 1, while the storage nodes
$n_1 + 1$ to $n$ are of type 2.

[Figure 1 shows Stage -1, Stage 0 and Stage 1 of the graph.]

Fig. 1. Information Flow Graph ($n_1 = n_2 = 2$, $d = 3$, $k = 2$).

The vertices are divided into stages, starting from stage $-1$.
In each stage $s \ge 1$, we have one newcomer which replaces a
failed node. The edges are directed, and labeled by the
corresponding capacities. We define the information flow graph
more formally as follows.

1) There is a single source vertex, $S$, in stage $-1$. It
represents the data object to be distributed among the
storage nodes.

2) We put $2n$ vertices in stage 0. These vertices are called
$\mathrm{In}_i$ and $\mathrm{Out}_i$, for $i = 1, 2, \ldots, n$. For each $i$, we draw a
directed edge from the source vertex to $\mathrm{In}_i$ with infinite
capacity. For $i = 1, 2, \ldots, n_1$, we draw a directed edge
from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_1$. This signifies that
the storage capacities of the storage nodes of type 1 are
limited to $\alpha_1$ units. For $i = n_1 + 1, n_1 + 2, \ldots, n$, we
draw a directed edge from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_2$.
This indicates that each node of type 2 can store no
more than $\alpha_2$ units of data.

3) For $s = 1, 2, \ldots$, we put two vertices in stage $s$. If
storage node $i$ fails in the $s$-th stage, we construct
two vertices, $\mathrm{In}_i$ and $\mathrm{Out}_i$, in stage $s$. The vertex $\mathrm{In}_i$
is connected to $d$ "Out" nodes in earlier stages. The
capacities of these $d$ edges are all equal to $\beta$. If node $i$
is of type $j$ ($j$ is either 1 or 2), we draw an edge from
$\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_j$.

4) A data collector is represented by a vertex, called DC,
which is connected to $k$ "Out" nodes with distinct
subscripts. All these $k$ edges have infinite capacity.

An example of the information flow graph is shown in
Fig. 1.
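The construction in items 1) to 4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the vertex labels, the particular failure pattern (node 3 fails in stage 1, then node 1 fails in stage 2), and the use of the Edmonds-Karp algorithm to compute the max-flow are choices made here and do not come from the paper. The parameters follow Fig. 1 ($n_1 = n_2 = 2$, $d = 3$, $k = 2$).

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(cap, source, sink):
    """Edmonds-Karp max-flow; cap[u][v] is the capacity of edge u -> v."""
    res = defaultdict(lambda: defaultdict(int))  # residual capacities
    for u, edges in cap.items():
        for v, c in edges.items():
            res[u][v] += c
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in list(res[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)  # bottleneck along the path
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        total += push


def two_failure_example(alpha1, alpha2, beta):
    """Graph with nodes 1-2 of type 1 and nodes 3-4 of type 2 (hypothetical labels)."""
    alpha = {1: alpha1, 2: alpha1, 3: alpha2, 4: alpha2}
    cap = defaultdict(dict)
    for i, a in alpha.items():  # stage 0: S -> In_i -> Out_i
        cap["S"][("in", i, 0)] = INF
        cap[("in", i, 0)][("out", i, 0)] = a

    def repair(i, stage, helpers):
        for h in helpers:  # d edges of capacity beta into the newcomer
            cap[h][("in", i, stage)] = beta
        cap[("in", i, stage)][("out", i, stage)] = alpha[i]

    repair(3, 1, [("out", 1, 0), ("out", 2, 0), ("out", 4, 0)])
    repair(1, 2, [("out", 3, 1), ("out", 2, 0), ("out", 4, 0)])
    for out in [("out", 3, 1), ("out", 1, 2)]:  # the DC reads the two newcomers
        cap[out]["DC"] = INF
    return cap


flow = max_flow(two_failure_example(alpha1=3, alpha2=2, beta=1), "S", "DC")
```

For this particular graph the max-flow to the data collector is $\min\{\alpha_2, 3\beta\} + \min\{\alpha_1, 2\beta\}$, so a file of size $M$ can be recovered by this data collector only if $M$ does not exceed that value.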

A flow on the information flow graph $G$ is an assignment
of non-negative real numbers to the edges, satisfying the flow
conservation constraints and the capacity constraints. A flow
$F$ can be regarded as a function from the edge set $\mathcal{E}$ to the
set of non-negative real numbers, $F : \mathcal{E} \to \mathbb{R}_+$, such that

(i) for each edge $e \in \mathcal{E}$, $F(e)$ is less than or equal to the
capacity of $e$, and

(ii) for each vertex other than the source vertex and the data
collectors, the sum of incoming flows is equal to the sum of
outgoing flows, i.e., if $v \in \mathcal{V}$ is either an "in" or "out" vertex,
then

$$\sum_{e:\ \mathrm{Head}(e) = v} F(e) = \sum_{e:\ \mathrm{Tail}(e) = v} F(e),$$

where $\mathrm{Head}(e)$ and $\mathrm{Tail}(e)$ stand for the head and tail of
edge $e$ respectively.

The value of a flow $F$ with respect to a data collector DC
is defined as the sum of incoming flows to this data collector,

$$\sum_{e:\ \mathrm{Head}(e) = \mathrm{DC}} F(e).$$

The maximal flow value with respect to a specific data
collector DC, denoted by max-flow(DC), is the maximal value
of flow to this data collector DC, over all legitimate flows.
The max-flow theorem in network coding [9], [10] says that
if max-flow(DC) $\ge M$ for every data collector DC, then there
exists a linear network code which sends $M$ units of data to
every data collector.

Given a particular data collector DC, an $(S, \mathrm{DC})$-cut is a
partition of the vertices $(\mathcal{W}, \bar{\mathcal{W}})$ such that $S \in \mathcal{W}$ and
$\mathrm{DC} \in \bar{\mathcal{W}}$. (Here $\bar{\mathcal{W}}$ stands for the set complement of
$\mathcal{W}$ in $\mathcal{V}$.) The capacity of an $(S, \mathrm{DC})$-cut is defined as the
sum of capacities of the edges from $\mathcal{W}$ to $\bar{\mathcal{W}}$. It is well known
that the max-flow with respect to a data collector DC is equal to
the minimum cut capacity. Let the capacity of an edge $e$ be
denoted by $c(e)$. For each $(S, \mathrm{DC})$-cut, we have the following
constraint:

$$\sum_{e:\ \mathrm{Tail}(e) \in \mathcal{W},\ \mathrm{Head}(e) \in \bar{\mathcal{W}}} c(e) \ge M. \qquad (1)$$

The summation in (1) is over all edges with tails in $\mathcal{W}$ and
heads in $\bar{\mathcal{W}}$. The storage cost minimization problem can be
expressed as follows:



$$\min\ C_S = C_1 n_1 \alpha_1 + C_2 n_2 \alpha_2, \qquad (2)$$

subject to the constraints (1) for all $(S, \mathrm{DC})$-cuts $(\mathcal{W}, \bar{\mathcal{W}})$.
The optimization is a linear programming problem with two
variables, $\alpha_1$ and $\alpha_2$.

Given parameters $n_1$, $n_2$, $k$, $d$, $M$, $\beta$, $C_1$ and $C_2$, we let
the minimum storage cost in the above linear program be $C_S^*$.
The values of $\alpha_1$ and $\alpha_2$ which achieve $C_S^*$ are denoted by
$\alpha_1^*$ and $\alpha_2^*$. We will also investigate the tradeoff between the
storage cost and the repair-bandwidth. In this context, we will
write $C_S^*(\beta)$, $\alpha_1^*(\beta)$ and $\alpha_2^*(\beta)$ as functions of $\beta$.

Theorem 1: Let $A$ be the set of $k$-vectors

$$\mathbf{a} = (a(1), a(2), \ldots, a(k))$$

whose components are either $\alpha_1$ or $\alpha_2$, and in which the number
of components equal to $\alpha_i$ is at most $n_i$, for $i = 1, 2$.
Given $n_1$, $n_2$, $k$, $d$ and $\beta$, the file size $M$ is upper bounded
by

$$M \le \sum_{i=1}^{k} \min\{a(i), (d - i + 1)\beta\}, \qquad (3)$$

for any $\mathbf{a} \in A$. Furthermore, we can construct an information
flow graph such that equality in (3) holds for some $\mathbf{a} \in A$.




Fig. 2. An example of a cut ($d = 3$, $k = 2$).



Proof: (sketch) The proof is based on the analysis of the min-cut
in the information flow graph, and is similar to the proof
of [4, Lemma 2]. The main difference is that in this paper, the
capacity of an edge between an "in" node and an "out" node
may be either $\alpha_1$ or $\alpha_2$, whereas in [4], all $\alpha$'s are identical.
Because the number of storage nodes of type $i$ is equal to $n_i$
($i = 1, 2$), there are at most $n_i$ edges with capacity $\alpha_i$ in a
min-cut. Therefore we take the minimum only over the set $A$.
As the proof of (3) is basically the same as that of Lemma 2
in [4], the details are omitted. ■

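We illustrate Theorem 1 by the example in Fig. 1. A sample
cut $(\mathcal{W}, \bar{\mathcal{W}})$ is shown in Fig. 2. The vertices in $\bar{\mathcal{W}}$ are drawn
in shaded color. The values of $a(1)$ and $a(2)$ are either $\alpha_1$
or $\alpha_2$. The set $A$ consists of the four pairs $(\alpha_1, \alpha_1)$, $(\alpha_1, \alpha_2)$,
$(\alpha_2, \alpha_1)$, and $(\alpha_2, \alpha_2)$. The file size $M$ is upper bounded by

$$M \le \min\{\alpha_1, 3\beta\} + \min\{\alpha_1, 2\beta\},$$
$$M \le \min\{\alpha_2, 3\beta\} + \min\{\alpha_1, 2\beta\},$$
$$M \le \min\{\alpha_1, 3\beta\} + \min\{\alpha_2, 2\beta\},$$
$$M \le \min\{\alpha_2, 3\beta\} + \min\{\alpha_2, 2\beta\}.$$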
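The bound of Theorem 1 can be evaluated by brute force for small $k$. The Python sketch below enumerates the set $A$ by node type and returns the tightest right-hand side of (3); the function name and the numeric values $\alpha_1 = 3$, $\alpha_2 = 2$, $\beta = 1$ are hypothetical choices made for illustration.

```python
from itertools import product


def theorem1_bound(alpha1, alpha2, n1, n2, k, d, beta):
    """Smallest right-hand side of (3) over all vectors a in the set A."""
    best = float("inf")
    for types in product((1, 2), repeat=k):
        if types.count(1) > n1 or types.count(2) > n2:
            continue  # more components of one type than nodes of that type
        a = [alpha1 if t == 1 else alpha2 for t in types]
        # With a 0-based index i, the factor (d - (i + 1) + 1) equals (d - i).
        best = min(best, sum(min(a[i], (d - i) * beta) for i in range(k)))
    return best


# Parameters of Fig. 1 (n1 = n2 = 2, d = 3, k = 2) with hypothetical capacities.
bound = theorem1_bound(alpha1=3, alpha2=2, n1=2, n2=2, k=2, d=3, beta=1)
```

With these values the four inequalities listed above evaluate to 5, 4, 5 and 4, so the largest file size they permit is 4.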

The cost minimization problem is to minimize $C_S$ in (2),
subject to the constraints in (3) over all $\mathbf{a} \in A$. This
optimization can be reduced to a linear programming problem,
as shown in the next theorem.

Theorem 2: Let $\theta_m = (k - m)(2d - k - m + 1)\beta/2$. The
cost minimization problem is equivalent to minimizing $C_S$
as defined in (2) subject to the following $2(k + 1)$ linear
constraints,

$$M \le \min\{m, n_1\}\alpha_1 + (m - \min\{m, n_1\})\alpha_2 + \theta_m, \qquad (4)$$
$$M \le (m - \min\{m, n_2\})\alpha_1 + \min\{m, n_2\}\alpha_2 + \theta_m, \qquad (5)$$

for $m = 0, 1, \ldots, k$.
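The closed form of $\theta_m$ can be checked directly. The worked equation below assumes, as the structure of (3) suggests, that $\theta_m$ is the sum of the terms $(d - i + 1)\beta$ in (3) with index $i > m$.

```latex
\begin{align*}
\theta_m &= \sum_{i=m+1}^{k} (d - i + 1)\beta \\
         &= \beta \left[ (k - m)(d + 1) - \frac{k(k+1) - m(m+1)}{2} \right] \\
         &= \beta\,(k - m) \left[ (d + 1) - \frac{k + m + 1}{2} \right] \\
         &= \frac{(k - m)(2d - k - m + 1)\beta}{2}.
\end{align*}
```

The third line uses $k(k+1) - m(m+1) = (k - m)(k + m + 1)$. In particular $\theta_k = 0$ and $\theta_0 = k(2d - k + 1)\beta/2$.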

Proof: For each $\mathbf{a} \in A$, the inequality in (3) can be
replaced by $2^k$ linear inequalities. We introduce a "switch"
function

$$S_b(x, y) = \begin{cases} x & \text{if } b = 0, \\ y & \text{if } b = 1. \end{cases}$$

Let $B = \{0, 1\}^k$ be the set of all binary vectors of length
$k$. The inequality in (3) is equivalent to the following $2^k$
inequalities:

$$M \le \sum_{i=1}^{k} S_{b_i}\big(a(i), (d - i + 1)\beta\big),$$

where $(b_1, b_2, \ldots, b_k) \in B$. This yields $|A| 2^k$ linear
inequalities.

We may group these $|A| 2^k$ linear inequalities by the number
of zeros in $(b_1, b_2, \ldots, b_k)$. Among those linear inequalities
with $m$ zeros in $(b_1, b_2, \ldots, b_k)$, where $m$ is an integer
between 0 and $k$, the most stringent inequality is the one
associated with

$$(b_1, b_2, \ldots, b_k) = (\underbrace{0, 0, \ldots, 0}_{m}, \underbrace{1, 1, \ldots, 1}_{k - m}),$$

which is

$$M \le \sum_{i=1}^{m} a(i) + \sum_{i=m+1}^{k} (d - i + 1)\beta = \sum_{i=1}^{m} a(i) + \theta_m.$$

If there are $p$ $\alpha_1$'s and $q$ $\alpha_2$'s among $a(1), \ldots, a(m)$, we have

$$M \le p\alpha_1 + q\alpha_2 + \theta_m.$$

Among the group of linear inequalities with $m$ zeros in
$(b_1, b_2, \ldots, b_k)$, many inequalities are redundant, meaning that
we can remove them without altering the feasible region. We
only retain two inequalities, the one in which the coefficient
of $\alpha_1$ is smallest and the one in which the coefficient of $\alpha_2$
is smallest, namely the inequalities in (4) and (5). The other
inequalities in the same group are convex combinations
of these two inequalities, and hence can be ignored without
changing the shape of the feasible region. ■
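The reduction in Theorem 2 can be checked numerically for small parameters. The following Python sketch compares the brute-force bound over the set $A$ from (3) with the minimum over the $2(k + 1)$ right-hand sides of (4) and (5). It is a sanity check under the assumptions $d \ge k$ and $n_1 + n_2 \ge k$, not a proof, and the function names are hypothetical.

```python
from itertools import product


def theta(m, k, d, beta):
    """theta_m = (k - m)(2d - k - m + 1) * beta / 2."""
    return (k - m) * (2 * d - k - m + 1) * beta / 2


def bound_theorem1(alpha1, alpha2, n1, n2, k, d, beta):
    """Brute force: minimum of the right-hand side of (3) over all a in A."""
    best = float("inf")
    for types in product((1, 2), repeat=k):
        if types.count(1) > n1 or types.count(2) > n2:
            continue  # this vector is not in A
        a = [alpha1 if t == 1 else alpha2 for t in types]
        best = min(best, sum(min(a[i], (d - i) * beta) for i in range(k)))
    return best


def bound_theorem2(alpha1, alpha2, n1, n2, k, d, beta):
    """Minimum of the right-hand sides of (4) and (5) over m = 0, ..., k."""
    best = float("inf")
    for m in range(k + 1):
        p, q = min(m, n1), min(m, n2)
        t = theta(m, k, d, beta)
        best = min(best,
                   p * alpha1 + (m - p) * alpha2 + t,   # constraint (4)
                   (m - q) * alpha1 + q * alpha2 + t)   # constraint (5)
    return best


def bounds_agree(alpha1, alpha2, n1, n2, k, d, beta, tol=1e-9):
    args = (alpha1, alpha2, n1, n2, k, d, beta)
    return abs(bound_theorem1(*args) - bound_theorem2(*args)) < tol
```

A file of size $M$ satisfies all constraints exactly when $M$ does not exceed the common value returned by the two functions.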

If we put $m = 0$ in either (4) or (5), we see that there is
no feasible solution to the linear programming problem if $\beta$
is strictly less than $\frac{2M}{k(2d - k + 1)}$. From now on, we will assume
that $\beta$ is no less than $\frac{2M}{k(2d - k + 1)}$.
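The feasibility threshold on $\beta$ follows from $M \le \theta_0 = k(2d - k + 1)\beta/2$. A minimal Python sketch, using the parameter values $M = 66$, $k = 6$, $d = 8$ that the paper uses in its numerical example, is given below; the function name is hypothetical.

```python
def min_beta(M, k, d):
    """Smallest beta for which the linear program is feasible: 2M / (k(2d - k + 1))."""
    return 2 * M / (k * (2 * d - k + 1))


beta_min = min_beta(M=66, k=6, d=8)
min_repair_bandwidth = 8 * beta_min  # d * beta at the threshold, with d = 8
```

For these values the threshold is $\beta = 2$, which corresponds to a minimum repair-bandwidth of $d\beta = 16$.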

IV. Storage Cost Minimization 

We solve the linear programming problem in Theorem 2 by
considering four different cases: (A) $n_1 \ge k$ and $n_2 \ge k$, (B)
$n_1 \ge k$ and $n_2 < k$, (C) $n_1 < k$ and $n_2 \ge k$, and (D) $n_1 < k$
and $n_2 < k$.

A. Case A: $n_1 \ge k$ and $n_2 \ge k$

When both $n_1$ and $n_2$ are larger than or equal to $k$, the two
inequalities in (4) and (5) can be written as

$$M \le m\alpha_1 + \theta_m, \quad \text{and} \qquad (6)$$
$$M \le m\alpha_2 + \theta_m. \qquad (7)$$

The region defined by these two inequalities is the intersection
of two half-planes, which can be obtained by translating the
first quadrant in the $\alpha_1$-$\alpha_2$ plane diagonally along the
45-degree line $\alpha_1 = \alpha_2$.

Theorem 3: For $\beta \ge \frac{2M}{k(2d - k + 1)}$, we have

$$\alpha_1^*(\beta) = \alpha_2^*(\beta) = \max_{1 \le m \le k} \frac{M - \theta_m}{m}.$$

Proof: Taking all constraints (6) and (7), for $m = 1, 2, \ldots, k$,
into consideration, the feasible region is of the form
$\{(\alpha_1, \alpha_2) : \alpha_1 \ge \mu \text{ and } \alpha_2 \ge \mu\}$, where $\mu$ is the maximum
value as defined in the theorem. No matter what the costs
$C_1$ and $C_2$ are (provided that they are positive), the optimal
solution to the linear program is at the corner point of
the feasible region, namely $(\alpha_1^*, \alpha_2^*) = (\mu, \mu)$. ■

Fig. 3. An example of the feasible region in the linear program.

In the case where $n_1$ and $n_2$ are both larger than or equal
to $k$, we see that the optimal storage allocation is to put the
same amount of data in both type 1 and type 2 nodes. The
storage costs of the two types of nodes do not matter.
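The closed form in Theorem 3 is straightforward to evaluate. The Python sketch below computes $\mu = \max_{1 \le m \le k} (M - \theta_m)/m$ and the resulting cost in Case A. The function names are hypothetical, and the values of $M$, $k$, $d$ and $\beta$ in the example are chosen only for illustration; Case A additionally requires $n_1 \ge k$ and $n_2 \ge k$.

```python
def theta(m, k, d, beta):
    """theta_m = (k - m)(2d - k - m + 1) * beta / 2."""
    return (k - m) * (2 * d - k - m + 1) * beta / 2


def case_a_allocation(M, k, d, beta):
    """Common optimal capacity mu = max over 1 <= m <= k of (M - theta_m) / m."""
    if beta < 2 * M / (k * (2 * d - k + 1)):
        raise ValueError("beta is below the feasibility threshold 2M / (k(2d - k + 1))")
    return max((M - theta(m, k, d, beta)) / m for m in range(1, k + 1))


def case_a_cost(M, k, d, beta, n1, n2, c1, c2):
    """Minimum cost in Case A: both node types store mu units."""
    if n1 < k or n2 < k:
        raise ValueError("Case A requires n1 >= k and n2 >= k")
    mu = case_a_allocation(M, k, d, beta)
    return (c1 * n1 + c2 * n2) * mu


mu = case_a_allocation(M=66, k=6, d=8, beta=3.3)
```

Because the optimum is $(\mu, \mu)$ regardless of the costs, the minimum cost in Case A is simply $(C_1 n_1 + C_2 n_2)\mu$.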

B. Case B: $n_1 \ge k$ and $n_2 < k$

For $m = 1, 2, \ldots, k$, the two inequalities in (4) and (5) can
be written as

$$m\alpha_1 \ge M - \theta_m,$$
$$(m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m,$$

where $q_m = \min\{m, n_2\}$. These two inequalities define an
infinite polyhedral region. For $m = 1, 2, \ldots, k$, let $\mathcal{R}_m$ be the
region

$$\mathcal{R}_m = \{(\alpha_1, \alpha_2) \in \mathbb{R}_+^2 : m\alpha_1 \ge M - \theta_m,\ (m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m\}.$$

The feasible region of the linear program is thus the
intersection of $\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_k$. The corner point of the region
$\mathcal{R}_m$ can be obtained by solving the two equations obtained
by setting the inequalities to equalities, and has coordinates

$$\alpha_1 = \alpha_2 = (M - \theta_m)/m.$$

In other words, for $m = 1, 2, \ldots, k$, the corner point of $\mathcal{R}_m$
lies on the line $\alpha_1 = \alpha_2$ in the $\alpha_1$-$\alpha_2$ plane.
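The corner-point coordinates can be verified by solving the two boundary equations of Case B. The short derivation below assumes $q_m = \min\{m, n_2\} \ge 1$, i.e., that there is at least one node of type 2.

```latex
\begin{align*}
m\alpha_1 &= M - \theta_m
  &&\Longrightarrow\quad \alpha_1 = \frac{M - \theta_m}{m}, \\
q_m\alpha_2 &= (M - \theta_m) - (m - q_m)\alpha_1
  = (M - \theta_m)\,\frac{q_m}{m}
  &&\Longrightarrow\quad \alpha_2 = \frac{M - \theta_m}{m}.
\end{align*}
```

Substituting the first solution into the second boundary equation cancels the factor $q_m$, which is why both coordinates coincide.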

An example of the feasible region is shown in Fig. 3. The
horizontal and the vertical axes are $\alpha_1$ and $\alpha_2$ respectively.
The parameters of the distributed storage system are $n_1 = 8$,
$n_2 = 2$, $d = 8$, $k = 6$, $M = 66$ and $\beta = 3.3$. The region to
the right of and above all lines is the feasible region. The dashed
line indicates the 45-degree line $\alpha_1 = \alpha_2$. The optimal point
is one of the vertices of the feasible region. The choice of the
vertex which minimizes the storage cost depends on the ratio
$C_1 n_1/(C_2 n_2)$, i.e., the slope of the objective function.
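Since the linear program has only two variables, its vertices can be enumerated directly. The Python sketch below builds the constraints (4) and (5) for $m = 1, \ldots, k$, intersects every pair of boundary lines, keeps the feasible intersection points, and picks the cheapest one. This vertex enumeration is an illustrative choice made here rather than the paper's method, the function names are hypothetical, and $\beta$ is assumed to be at or above the feasibility threshold so that the $m = 0$ constraint holds.

```python
from itertools import combinations


def theta(m, k, d, beta):
    """theta_m = (k - m)(2d - k - m + 1) * beta / 2."""
    return (k - m) * (2 * d - k - m + 1) * beta / 2


def lp_rows(n1, n2, k, d, M, beta):
    """Rows (u, v, t) meaning u*alpha1 + v*alpha2 >= t, from (4)-(5) with m >= 1."""
    rows = [(1, 0, 0.0), (0, 1, 0.0)]  # alpha1 >= 0 and alpha2 >= 0
    for m in range(1, k + 1):
        p, q = min(m, n1), min(m, n2)
        t = M - theta(m, k, d, beta)
        rows += [(p, m - p, t), (m - q, q, t)]
    return rows


def vertices(rows, eps=1e-9):
    """Feasible intersection points of pairs of boundary lines."""
    points = []
    for (a1, a2, r), (b1, b2, s) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundaries have no unique intersection
        x, y = (r * b2 - a2 * s) / det, (a1 * s - b1 * r) / det
        if all(u * x + v * y >= t - eps for u, v, t in rows):
            points.append((x, y))
    return points


def best_vertex(n1, n2, k, d, M, beta, c1, c2):
    """Vertex minimizing c1*n1*alpha1 + c2*n2*alpha2 (assumes beta is feasible)."""
    pts = vertices(lp_rows(n1, n2, k, d, M, beta))
    return min(pts, key=lambda p: c1 * n1 * p[0] + c2 * n2 * p[1])


equal_costs = best_vertex(8, 2, 6, 8, 66, 3.3, c1=1, c2=1)
costly_type2 = best_vertex(8, 2, 6, 8, 66, 3.3, c1=1, c2=10)
```

With the parameters of the example, equal costs select the vertex on the line $\alpha_1 = \alpha_2$, while a sufficiently expensive type 2 moves the optimum to a vertex with smaller $\alpha_2$ and larger $\alpha_1$.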
Fig. 4. Storage Cost and repair-bandwidth Tradeoff, $C_1 = 1$.



We can observe from Fig. 3 that if the cost $C_1$ is much
greater than $C_2$, then the optimal point always lies on the line
$\alpha_1 = \alpha_2$, i.e., $\alpha_1^*(\beta) = \alpha_2^*(\beta)$ for all $\beta$.

Case C is similar to Case B. The feasible region of Case C
can be regarded as the mirror image of the feasible region of
Case B with respect to the line $\alpha_1 = \alpha_2$. We therefore skip
the discussion on Case C.

C. Case D: $n_1 < k$ and $n_2 < k$

The feasible region of the linear program in Theorem 2 is
bounded by

$$p_m\alpha_1 + (m - p_m)\alpha_2 \ge M - \theta_m,$$
$$(m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m,$$

for $m = 1, 2, \ldots, k$, where $q_m$ is defined as in the previous
subsection and $p_m = \min\{m, n_1\}$. The feasible region is the
intersection of

$$\mathcal{R}_m = \{(\alpha_1, \alpha_2) \in \mathbb{R}_+^2 : p_m\alpha_1 + (m - p_m)\alpha_2 \ge M - \theta_m,\ (m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m\}$$

for $m = 1, 2, \ldots, k$. As in Case B, we can show that for
$m = 1, 2, \ldots, k$, the vertex of the polyhedral region $\mathcal{R}_m$ lies
on the line $\alpha_1 = \alpha_2$ in the $\alpha_1$-$\alpha_2$ plane.

V. Tradeoff between Storage Cost and 
Repair-Bandwidth 

Explicit formulae for $\alpha_1^*(\beta)$, $\alpha_2^*(\beta)$ and $C_S^*(\beta)$ can be
found, but due to space limitations, we do not include the
formulae in this paper.

To illustrate the tradeoff between storage cost and
repair-bandwidth, we consider a distributed storage system with
the parameters used in Fig. 3: $n_1 = 8$, $n_2 = 2$, $d = 8$, $k = 6$,
$M = 66$. The minimum repair-bandwidth is
$2Md/(k(2d - k + 1)) = 16$. We fix the cost $C_1$ for the storage
nodes of type 1 to be 1, and increase $C_2$ from 0.2 to 1.8, with
step size 0.4. For each value of $C_2$ we plot $C_S^*(\beta)$ for $d\beta$
from 16 to 32. The resulting curves are shown in Fig. 4. The
curve in the middle corresponds to $C_1 = C_2 = 1$. This reduces
to the case in [4] where the costs of both types of nodes are the
same.
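The curves described above can be approximated numerically by solving the two-variable linear program for each value of $d\beta$. The Python sketch below does so by checking every feasible intersection of constraint boundaries. It is a sketch under stated assumptions: the solver and function names are hypothetical, the sweep uses integer values of $d\beta$ from 16 to 32, and it is not a reproduction of the paper's plotting procedure.

```python
from itertools import combinations


def theta(m, k, d, beta):
    """theta_m = (k - m)(2d - k - m + 1) * beta / 2."""
    return (k - m) * (2 * d - k - m + 1) * beta / 2


def min_cost(n1, n2, k, d, M, beta, c1, c2, eps=1e-9):
    """Minimum of c1*n1*alpha1 + c2*n2*alpha2 under (4)-(5); None if infeasible."""
    if M - theta(0, k, d, beta) > eps:
        return None  # the m = 0 constraint fails: beta is too small
    rows = [(1, 0, 0.0), (0, 1, 0.0)]  # non-negativity of alpha1 and alpha2
    for m in range(1, k + 1):
        p, q = min(m, n1), min(m, n2)
        t = M - theta(m, k, d, beta)
        rows += [(p, m - p, t), (m - q, q, t)]
    best = None
    for (a1, a2, r), (b1, b2, s) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue
        x, y = (r * b2 - a2 * s) / det, (a1 * s - b1 * r) / det
        if all(u * x + v * y >= t - eps for u, v, t in rows):
            cost = c1 * n1 * x + c2 * n2 * y
            best = cost if best is None else min(best, cost)
    return best


def tradeoff_curve(c2, bandwidths=range(16, 33)):
    """Pairs (d*beta, minimum cost) for n1 = 8, n2 = 2, d = 8, k = 6, M = 66, C1 = 1."""
    return [(bw, min_cost(8, 2, 6, 8, 66, bw / 8, 1.0, c2)) for bw in bandwidths]


curves = {c2: tradeoff_curve(c2) for c2 in (0.2, 0.6, 1.0, 1.4, 1.8)}
```

Each curve is non-increasing in $d\beta$, because every $\theta_m$ grows with $\beta$ and the feasible region therefore only enlarges as the repair-bandwidth increases.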



VI. Conclusion 

In this paper, we aim at seeking an optimal storage allo- 
cation that minimizes the storage cost in distributed storage 
systems. Specifically, we focus on the network with two types 
of storage nodes, each having a different storage cost. We 
demonstrate that the minimization problem can be solved as 
a linear programming problem. It is shown that the feasible 
region can be determined by analyzing the min-cut constraints 
of the corresponding information flow graph. The minimum 
storage cost can be achieved at the corner points. Moreover, 
the tradeoff between the storage cost and repair-bandwidth 
is established. Our method can be extended to more general 
cases, in which the storage costs of all storage nodes are not 
the same. 

We can implement a coding scheme and repair protocol
for distributed storage systems with storage cost by using
random linear network coding over a finite field. The packets
transmitted from a surviving storage node to the newcomer
are linear combinations of the data in the memory of the
surviving storage node. If we apply existing code construction
methods from linear network coding to distributed storage
systems, the required finite field size may be unbounded. This
is because the finite field size requirement is a monotonically
increasing function of the number of data collectors, which
may be unbounded. To make sure that the regeneration process
will be successful after arbitrarily many stages of repairs, it is
important to show that the finite field size requirement is upper
bounded by some constant. How to construct linear network
codes for distributed storage systems with storage cost is an
interesting direction for future studies.
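The repair step described above, in which a surviving node sends random linear combinations of its stored data, can be sketched as follows. This is a minimal illustration only: the prime field size 257, the packet layout, and the function name are assumptions made here, and the paper does not specify a field or a concrete protocol.

```python
import random

P = 257  # hypothetical prime field size, chosen only for illustration


def repair_packets(stored, num_packets, rng):
    """Return `num_packets` random linear combinations of the stored packets.

    `stored` is a list of equal-length packets (lists of ints in [0, P)).
    Each result is (coefficients, payload) with
    payload[t] = sum_j coefficients[j] * stored[j][t] mod P.
    """
    packets = []
    for _ in range(num_packets):
        coeffs = [rng.randrange(P) for _ in stored]
        payload = [
            sum(c * symbol for c, symbol in zip(coeffs, column)) % P
            for column in zip(*stored)
        ]
        packets.append((coeffs, payload))
    return packets


memory = [[1, 2, 3], [4, 5, 6]]  # two stored packets of three symbols each
sent = repair_packets(memory, num_packets=1, rng=random.Random(0))
```

In this sketch `num_packets` plays the role of $\beta$ and the number of stored packets plays the role of the node capacity. Whether such random combinations keep the system repairable over arbitrarily many stages is exactly the field-size question raised above.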